AITopics | mixture-of-expert vision transformer

M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

Neural Information Processing Systems, Dec-25-2025, 02:01:52 GMT

Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly. Multi-task models have become successful, and often essential, in sophisticated systems such as autonomous driving and indoor robots. However, deploying MTL on such real-world systems, which are often resource-constrained or latency-sensitive, raises two prominent challenges: (i) during training, simultaneously optimizing all tasks is often difficult due to gradient conflicts across tasks, and the difficulty is amplified as a growing number of tasks must be squeezed into one compact model; (ii) at inference, current MTL regimes have to activate nearly the entire model even to execute a single task. Yet most real systems demand only one or two tasks at any moment and switch flexibly between tasks as needed, so such "all tasks activated" inference is highly inefficient and non-scalable in practice. In this paper, we present a model-accelerator co-design framework that enables efficient on-device MTL and tackles both training and inference bottlenecks.

efficient multi-task learning, mixture-of-expert vision transformer, model-accelerator co-design

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.63)
Information Technology > Artificial Intelligence > Robots (0.57)

Supplementary Materials for M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

Neural Information Processing Systems, Aug-18-2025, 02:29:15 GMT

The output feature of the final ViT block is fed into decoders for multi-task predictions. The router is a single-layer MLP that maps each token embedding to a selection probability over the experts. The batch size is 16. [...] LUTs, 461K registers, 11 Mbit block RAM, and 27 Mbit UltraRAM. It runs at a clock frequency of 1,395 MHz and consumes 295 W of power.
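The router described in this excerpt can be illustrated with a minimal sketch. The excerpt only states that a single-layer MLP maps a token embedding to expert selection probabilities, so the softmax normalization, the top-k selection, the gate renormalization, and every name and size below (router_probs, top_k_experts, EMBED_DIM, NUM_EXPERTS, TOP_K) are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def router_probs(token, weight, bias):
    """One linear layer followed by a softmax over experts (softmax is an assumption).

    token:  list of embed_dim floats
    weight: num_experts rows, each a list of embed_dim floats
    bias:   list of num_experts floats
    """
    logits = [sum(w * x for w, x in zip(row, token)) + b
              for row, b in zip(weight, bias)]
    peak = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_experts(probs, k):
    """Return the indices of the k most probable experts and their renormalized gates.

    Top-k routing is a common mixture-of-experts choice, assumed here.
    """
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return chosen, [probs[i] / mass for i in chosen]


# Hypothetical sizes, chosen only to keep the example small.
EMBED_DIM, NUM_EXPERTS, TOP_K = 8, 4, 2

random.seed(0)
weight = [[random.gauss(0.0, 0.02) for _ in range(EMBED_DIM)]
          for _ in range(NUM_EXPERTS)]
bias = [0.0] * NUM_EXPERTS
token = [random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]

probs = router_probs(token, weight, bias)
chosen, gates = top_k_experts(probs, TOP_K)
```

Under these assumptions, only the experts listed in chosen would be evaluated for the token, with their outputs weighted by gates. That sparsity is what allows a mixture-of-experts model to activate part of the network instead of all of it.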

artificial intelligence, machine learning, vit

Country: North America > United States > Texas > Travis County > Austin (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

Neural Information Processing Systems, Jan-18-2025, 16:07:50 GMT

efficient multi-task learning, mixture-of-expert vision transformer, model-accelerator co-design

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
